Structure of E. coli DnaC helicase loader

Introduction
The DnaC protein in E. coli is part of the complex of proteins, including the replisome and primosome, that replicates genomic DNA. DnaC is believed to be involved in loading the DnaB helicase, which separates the strands of the DNA so that each can be primed by the DnaG primase and copied by DNA polymerase.

3D Structure: Homology Model
No empirical (X-ray crystallographic) 3D structure for the E. coli DnaC protein was available as of December 2008, although one or more may become available shortly. In view of this, a homology model was constructed using the automated Swiss-Model server. Swiss-Model deemed the only usable template for the homology model to be the crystal structure of a "putative primosome component" from Streptococcus pyogenes (2qgz) determined by the Northeast Structural Genomics Consortium, "to be published". The main features of the tertiary structure of this homology model are supported by their structural similarity to a crystal structure (also "to be published") of the DnaC helicase loader of Aquifex aeolicus (3ec2). Therefore, the fold and topology of this model are likely to be correct. However, because the sequence identity between the template and the target E. coli DnaC is only 19%, there may be significant errors in the registration of the E. coli DnaC sequence with the structure. Further, the positions of sidechains in homology models are generally unreliable.

We thank the authors of 2qgz and 3ec2 for releasing their structure data at the Protein Data Bank prior to full publication. Without these data, the homology model of E. coli DnaC would not have been possible in December 2008.

Viewing and Download
In addition to the interactive scenes below, the homology model can be downloaded from the Proteopedia server as Dnac_from_2ggz_a.pdb, or viewed and explored in FirstGlance in Jmol. (Note that the PDB filename contains a typographical error: 2ggz should have been 2qgz.)

Conclusions from Homology Model


The homology model (restore initial scene) represents 75% of the full-length E. coli DnaC sequence, omitting 54 N-terminal residues and 8 C-terminal residues. Several surface loops in the model (shown translucent white) have high uncertainty, since the corresponding residues are missing in the template (see below). Note that the homology model is somewhat unreliable about which residues are actually on the surface vs. largely buried, and hence the conclusions below are tentative.

Charge Distribution
Cationic patch: The model displays a patch, about 20 x 30 Å, containing six positively charged amino acids and no negative charges. The patch is near the bottom of this scene. Such a patch would be suitable for interaction with, e.g., anionic phosphates in ATP/ADP or DNA. The positive charges are Arg55, Arg59, Arg63, Arg126, Lys128, and Arg237. His66 and His114 are near one end of this patch.
 * Backbone trace.
 * Charge overlaid on backbone trace.

Evolutionary Conservation
Two prominent patches of highly conserved residues are apparent.


 * Large conserved patch. The larger of the two conserved patches is adjacent to the positively charged patch, and includes Arg237 and His114, which are highly conserved (ConSurf level 9). The larger patch also includes highly conserved surface residues Asn113, Asp169, Glu170, Asn203. (There are also three highly conserved surface glycines in this patch, not listed because surface glycines are typically conserved for reasons of secondary structure rather than function.) These highly conserved residues (ConSurf level 9) are flanked by several conserved residues (ConSurf level 8), including Ile62, Asn73, and Thr110. Conserved (level 8) residues are uncommon elsewhere on the surface (except for the other conserved patch).


 * Small conserved patch. This patch consists of highly conserved (ConSurf level 9) residues Arg216, Asp219, Arg220, flanked by conserved (ConSurf level 8) residue Asp189.


 * The same two conserved surface patches are observed on the 2.7 Å crystal structure of DnaC from Aquifex aeolicus (3ecc), where the larger conserved patch is the binding site for ATP/ADP.

Homology Model Construction


Steve Sandler kindly provided the following sequence for DnaC from E. coli:

 MKNVGDLMQR LQKMMPAHIK PAFKTGEELL AWQKEQGAIR SAALERENRA MKMQRTFNRS GIRPLHQNCS FENYRVECEG QMNALSKARQ YVEEFDGNIA SFIFSGKPGT GKNHLAAAIC NELLLRGKSV LIITVADIMS AMKDTFRNSG TSEEQLLNDL SNVDLLVIDE IGVQTESKYE KVIINQIVDR RSSSKRPTGM LTNSNMEEMT KLLGERVMDR MRLGNSLWVI FNWDSYRSRV TGKEY 

This sequence (245 amino acids) was submitted to Swiss-Model, which generated the homology model shown here (restore initial scene) using as a template 2qgz chain A, which has 18.6% sequence identity. Apparently Swiss-Model used predicted secondary structure to help in the sequence alignment, but details are not clear to me. The homology model represents residues 55-237 (183 residues representing 75% of DnaC), shown in boldface in the above sequence. Because of the low sequence identity, this model may well contain significant errors, especially in registration.
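The residue counts quoted in this section can be checked directly against the sequence. The following Python sketch is not part of the original modeling workflow; the names `DNAC_SEQ`, `MODEL_START`, `MODEL_END`, `residue_at` and `matches` are introduced here only for illustration. It recomputes the sequence length, the fraction covered by the model, and whether residue labels used on this page (such as Arg55 or Lys128) agree with the sequence.

```python
# E. coli DnaC sequence as listed above; whitespace is stripped on load.
DNAC_SEQ = "".join("""
MKNVGDLMQR LQKMMPAHIK PAFKTGEELL AWQKEQGAIR SAALERENRA
MKMQRTFNRS GIRPLHQNCS FENYRVECEG QMNALSKARQ YVEEFDGNIA
SFIFSGKPGT GKNHLAAAIC NELLLRGKSV LIITVADIMS AMKDTFRNSG
TSEEQLLNDL SNVDLLVIDE IGVQTESKYE KVIINQIVDR RSSSKRPTGM
LTNSNMEEMT KLLGERVMDR MRLGNSLWVI FNWDSYRSRV TGKEY
""".split())

# Residue range covered by the homology model (from the text).
MODEL_START, MODEL_END = 55, 237

# Only the three-letter codes needed for the labels checked below.
THREE_TO_ONE = {"Arg": "R", "Lys": "K", "His": "H"}


def residue_at(position):
    """One-letter code at a 1-based sequence position."""
    return DNAC_SEQ[position - 1]


def matches(label):
    """True if a label such as 'Arg55' agrees with the sequence."""
    name, position = label[:3], int(label[3:])
    return residue_at(position) == THREE_TO_ONE[name]


model_length = MODEL_END - MODEL_START + 1
coverage_percent = round(100 * model_length / len(DNAC_SEQ))
n_term_omitted = MODEL_START - 1
c_term_omitted = len(DNAC_SEQ) - MODEL_END

print(len(DNAC_SEQ), model_length, coverage_percent, n_term_omitted, c_term_omitted)
print(all(matches(x) for x in ["Arg55", "Arg59", "Arg63", "Arg126", "Lys128", "Arg237"]))
```

The same `matches` helper can be extended to other labels by adding the corresponding three-letter codes to `THREE_TO_ONE`.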

Swiss-Model has apparently used the temperature value field in the PDB file to indicate regions that are highly unreliable, namely the regions that are red when the model is colored by temperature. These regions are shown as translucent white in the initial scene (using the Jmol command select temperature >50). The uncertainty in three of these regions is explained by gaps in the template model (see below). Although the details of these regions are even more uncertain than other regions, it seems likely that these loops are on the surface, if the homology model turns out to be substantially correct.
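Outside Jmol, a comparable selection can be approximated by reading the temperature-factor column of the PDB file. The sketch below assumes standard fixed-column PDB ATOM records (residue number in columns 23-26, temperature factor in columns 61-66). The function names `residues_above` and `atom_line` are hypothetical helpers written for this illustration, and the demonstration records use made-up values rather than data from the actual model file.

```python
def residues_above(pdb_lines, cutoff=50.0):
    """Return sorted residue numbers having any atom with temperature factor > cutoff."""
    flagged = set()
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        res_seq = int(line[22:26])        # columns 23-26: residue sequence number
        temp_factor = float(line[60:66])  # columns 61-66: temperature factor
        if temp_factor > cutoff:
            flagged.add(res_seq)
    return sorted(flagged)


def atom_line(serial, name, res_name, chain, res_seq, b_factor):
    """Build a minimal fixed-column ATOM record (coordinates zeroed) for demonstration."""
    return (
        f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{b_factor:6.2f}"
    )


# Made-up records: residue numbers and temperature values are illustrative only.
demo = [
    atom_line(1, "CA", "ARG", "A", 55, 20.0),
    atom_line(2, "CA", "GLY", "A", 150, 99.0),
    atom_line(3, "CA", "THR", "A", 151, 60.5),
]

print(residues_above(demo))
```

With a real file, the lines could be supplied by opening the downloaded PDB file and passing the file object to `residues_above`.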

As indicated above, Swiss-Model found only one usable template for homology modeling, despite the existence of an empirical crystal structure of a DnaC (from Aquifex aeolicus; 3ec2 and 3ecc) with slightly higher sequence identity (23% versus 19%).

Sources: Swiss-Model [sm]; targetdb.pdb.org [tdb]; pdb.org using a BLAST search [pdbB], or a FASTA search [pdbF]. (a) Lengths not in parentheses are for crystallographic results, and are counts of amino acids with coordinates; they exclude disordered residues ("gaps" in the model). Lengths in parentheses are for the target sequence of DnaC, or sequences of the crystallized protein (from SEQRES in the PDB file).

Gaps in the Template Model
The template was 2QGZ (initial scene). The portion of the template used was Glu107-Arg300. Only the amino-terminal 7 residues (Gln100-Ser106) were not used as template (translucent). Note that there are <scene name='User:Eric_Martz/Sandbox_4/2qgz/5'>three loops</scene> in this segment of the template that lack coordinates due to disorder in the crystal (marked with spacefilled alpha-carbon atoms).

The missing loops are 202-205 (NGSV), 226-231 (EQATSW), and 268-275 (TIKGSDET). These gaps, which occur between the residues marked /\ below, were apparently ignored in making the model, which has a continuous main chain.
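Gaps of this kind can be located mechanically from the residue numbers that have coordinates. The sketch below is illustrative only: `find_gaps` is a hypothetical helper, and the list `observed` is rebuilt from the numbers stated on this page (residues 100-300 minus the three disordered loops) rather than read from the 2QGZ file.

```python
def find_gaps(residue_numbers):
    """Return (first_missing, last_missing) pairs for breaks in a run of residue numbers."""
    gaps = []
    ordered = sorted(set(residue_numbers))
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > 1:
            gaps.append((previous + 1, current - 1))
    return gaps


# Residue numbers with coordinates, reconstructed from the text:
# 100-300 without loops 202-205, 226-231 and 268-275.
missing = set(range(202, 206)) | set(range(226, 232)) | set(range(268, 276))
observed = [n for n in range(100, 301) if n not in missing]

print(find_gaps(observed))
```

In practice the residue numbers would come from the alpha-carbon ATOM records of the template chain.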

Below is the alignment produced by Swiss-Model, used in making the 3D model. Vertical bars marking identities were inserted by hand.

                                                  |     | |  |     ||
 TARGET    55             R TFNRSGIRPL HQNCSFENYR VECEGQMNAL SKARQYVEEF
 2qgzA     100   qkqaais--e riqlvslpks yrhihlsdid vnnasrmeaf saildfveqy
 TARGET                     sssss    h h             hhhhhhh hhhhhhhhh
 2qgzA               hhh  h   sss    h h             hhhhhhh hhhhhhhhh

                             |  | ||   | ||   ||     | |              |
 TARGET    96    DGN-IASFIF SGKPGTGKNH LAAAICNELL L-RGKSVLII TVADIMSAMK
 2qgzA     148   psaeqkglyl ygdmgigksy llaamahels ekkgvsttll hfpsfaidvk
 TARGET                ssss ss     hhh hhhhhhhhhh h h   ssss sshhhhhhh
 2qgzA                 ssss ss     hhh hhhhhhhhhh hh    ssss sshhhhhhh

                                    ||   |  | ||                |   |
 TARGET    144   DTFRNSGTSE EQLLNDLSNV DLLVIDEIGV QTESKYEKVI INQIVDRRSS
 2qgzA     198   naiske     --eidavknv pvlilddiga vrde-v     lqvilqyrml
                    /\                          / \
 TARGET                         hhh     ssssss               hhhhhhhhhh
 2qgzA                        hh   h    ssssss               hhhhhhhhhh

                    ||    |                 ||| |  |               |
 TARGET    194   SKRPTGMLTN SNMEEMTKLL ---GERVMDR MRLGNSLWVI FNWDSYR
 2qgzA     247   eelptfftsn ysfadlerkw awqakrvmer vr-ylarefh leganrr-
                                       /\
 TARGET          h  ssssss    hhhhh          hhhh hh  ssssss s
 2qgzA           h  ssssss    hhhh           hhhh hh hh ssss s
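Identities in an alignment like this one can also be counted programmatically rather than by hand. The sketch below is a minimal illustration, not the procedure Swiss-Model uses to report sequence identity; `count_identities` is a hypothetical helper, and the two rows are copied from the second block of the alignment (target residues 96-143).

```python
def count_identities(target_row, template_row):
    """Count aligned columns where both rows carry the same residue (case-insensitive)."""
    target = target_row.replace(" ", "").upper()
    template = template_row.replace(" ", "").upper()
    if len(target) != len(template):
        raise ValueError("aligned rows must have the same number of columns")
    # Gap characters never count as identities.
    return sum(1 for a, b in zip(target, template) if a == b and a != "-")


target_row = "DGN-IASFIF SGKPGTGKNH LAAAICNELL L-RGKSVLII TVADIMSAMK"
template_row = "psaeqkglyl ygdmgigksy llaamahels ekkgvsttll hfpsfaidvk"

print(count_identities(target_row, template_row))
```

Dividing such a count by the chosen alignment length gives a percent identity; which length to divide by (target length, template length, or aligned columns) is a convention that differs between tools.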

Below is the sequence with ATOM records (coordinates) from 2QGZ, numbered 100-300, showing the gaps as "...". This sequence listing was used to locate the positions marked /\ above.

    1 .......... .......... .......... .......... ..........
   51 .......... .......... .......... .......... .........Q
  101 KQAAISERIQ LVSLPKSYRH IHLSDIDVNN ASRMEAFSAI LDFVEQYPSA
  151 EQKGLYLYGD MGIGKSYLLA AMAHELSEKK GVSTTLLHFP SFAIDVKNAI
  201 S....KEEID AVKNVPVLIL DDIGA..... .VRDEVLQVI LQYRMLEELP
  251 TFFTSNYSFA DLERKWA... .....WQAKR VMERVRYLAR EFHLEGANRR

(Copied from Protein Explorer's sequence display.)

Below is the alignment of full-length DnaC with 2QGZ according to TargetDB (see above). Note that the 2QGZ structure begins at residue 100, and so the homology model begins with residue 55 of DnaC, indicated with > below.

 ID: DR58   Center: NESGC   E-value: 0.00028   Identity: 19.737%

                                      10        20        30
 Query                        MKNVGDLMQRLQKMMPAHIKPAFKTGEELLAWQKEQGA
                                      Q+ Q   P++I  +++    +     + +
 Subjct EVASFISQHHLSQEQINLSLSKFNQFLVERQKYQLKDPSYIAKGYQPILAMNEGYADVSY
                40        50        60        70        80        90

        40        50    >   60        70        80        90
 Query  IRSAALERENRAMKMQRTFNRSGIRPLHQNCSFENYRVECEGQMNALSKARQYVEEF-DG
        +++  L + ++   +++ ++  ++   +++  + +  V+  ++M+A+S   ++VE++ ++
 Subjct LETKELVEAQKQAAISERIQLVSLPKSYRHIHLSDIDVNNASRMEAFSAILDFVEQYPSA
               100       110       120       130       140       150

        100       110       120        130       140       150
 Query  NIASFIFSGKPGTGKNHLAAAICNELLLR-GKSVLIITVADIMSAMKDTFRNSGTSEEQL
        +  ++ + G  G GK++L AA+ +EL  + G S+ ++   ++   +K+++ N++++EE
 Subjct EQKGLYLYGDMGIGKSYLLAAMAHELSEKKGVSTTLLHFPSFAIDVKNAISNGSVKEE--
               160       170       180       190       200

         160       170        180       190       200       210
 Query  LNDLSNVDLLVIDEIGV-QTESKYEKVIINQIVDRRSSSKRPTGMLTNSNMEEMTK
        ++ ++NV +L++D+IG+ Q+ S +   +++ I++ R   + PT + +N ++ ++ +
 Subjct IDAVKNVPVLILDDIGAEQATSWVRDEVLQVILQYRMLEELPTFFTSNYSFADLERKWAT
       210       220       230       240       250       260

                     220       230       240
 Query  LLG---    ERVMDRMRLGNSLWVIFNWDSYRSRVTGKEY
        + G       +RVM+R+R
 Subjct IKGSDETWQAKRVMERVRYLAREFHLEGANRR
       270       280       290       300

Confirmation of Homology Model By Related Structures
<applet load='2chg9-63_aligned_with_dnac_model.pdb' size='400' frame='true' align='right' caption='Structural alignment.' scene='User:Eric_Martz/Sandbox_6/2qgz_3ec2_aligned_pdb/1'/>

When the PDB is searched with the DnaC sequence, the best match (December 2008) is 23% sequence identity over 183 amino acids with the DnaC helicase loader of Aquifex aeolicus, 3ec2 and 3ecc. To find out whether these structures have the same fold as the template used for the homology model (2qgz, with 19% sequence identity to E. coli DnaC), <font color="#3030ff">2qgz</font> was structurally aligned (<scene name='User:Eric_Martz/Sandbox_6/2qgz_3ec2_aligned_pdb/1'>restore initial alignment scene</scene>) with <font color="#ff0000">3ec2</font>. The similarity of folds lends considerable confidence to the homology model of E. coli DnaC.

The second-best sequence-identity hit in the PDB is 39% identity with 54 amino acids (positions 9-63 of chain A) of replication factor C (2chg), which align with 72-124 of DnaC. When the above homology model of DnaC (made with template 2QGZ) is <scene name='User:Eric_Martz/Sandbox_4/2chg9-63_aligned_with_dnac_mod/1'>structurally aligned</scene> with residues 9-63 of 2CHG, 43 alpha carbons (out of 54) align with an RMS deviation of 2.3 Å. <font color="#ff0000">Residues 21-63 of 2CHG</font> aligned with <font color="#3030ff">residues 80-124 of the DnaC homology model</font>. (Non-aligned portions are pastel.) This result adds further confidence to this region of the homology model, since the structural alignment of 2CHG:A21-63 occurred in the same range as the sequence alignment (which was 72-124 in DnaC).
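For readers unfamiliar with the RMS deviation quoted for the aligned alpha carbons, the sketch below shows how the number is computed once two sets of atoms have been paired and superimposed. It is a minimal illustration: the function `rmsd` is a hypothetical helper, it does not perform the superposition itself, and the coordinates in the example are invented, not taken from 2CHG or the DnaC model.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired (x, y, z) points, assumed already superimposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))


# Invented coordinates: every paired atom is displaced by exactly 2 units along z.
set_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 1.0, 0.0)]
set_b = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (2.0, 1.0, 2.0)]

print(rmsd(set_a, set_b))
```

Because the deviation is averaged over the paired atoms only, an RMS deviation should always be read together with the number of atoms aligned (here, 43 of 54).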

Download the above structural alignments:
 * 2qgz_3ec2_aligned.pdb
 * 2chg9-63_aligned_with_dnac_model.pdb

Crystal Structure of DnaC Is "In The Pipeline"
A sequence-based search at the international Structural Genomics TargetDB reveals that the closest completed structure is 2qgz, the one chosen by Swiss-Model as a template. (3ec2 and 3ecc were not determined by a structural genomics project.) A number of crystal and NMR structures have sequence identities up to 37%, but over shorter stretches and with higher E values.

Diffraction data have been obtained (but the solved structure not yet deposited) for a Listeria monocytogenes sequence of 307 residues, pI 5.2, with an E value of 1.6e-05, though only 21% sequence identity. Diffraction-quality crystals (but not yet diffraction data) have not been obtained for any sequence with such a low E value.

E. coli DnaC (245 residues, pI 9.4) has been crystallized by RIKEN Structural Genomics Initiative (Japan), but the crystals may not be of diffraction quality. It has been cloned, expressed as a soluble protein, and purified (but not yet crystallized) by 3 Structural Genomics Groups (RIKEN Structural Genomics Initiative (Japan), Montreal-Kingston Bacterial Structural Genomics Initiative, Midwest Center for Structural Genomics), as have several proteins with >40% sequence identity.

Thus, there is reason for optimism that either a crystal structure, or a more suitable template for homology modeling, will be forthcoming soon.

Additional Resources
For additional information, see: DNA Replication, Repair, and Recombination; Nucleic Acids